Author: Jesper Jørgensen
Posted: 1/17/2006 10:00:00 AM
Designing websites in multiple languages requires special attention if you want to make the most of search engines like Google, which crawl your website from the outside, starting from a URL. Knowing how webcrawlers work improves your chances of getting it right on the first try.
Unless you have special knowledge about a specific webcrawler and only want to optimize your site for that engine, you should assume the following to be true:
- Webcrawlers do not perform FORM posts.
- Webcrawlers do not execute javascript.
- Webcrawlers do not keep session state.
- Webcrawlers keep only one indexed version of any given URL.
- Webcrawlers follow only links in <A href="..."> tags.
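To make the link-following rule concrete, here is a minimal Python sketch of a crawler's link extraction. It assumes a crawler that behaves exactly as the list above describes (real crawlers are far more elaborate), and the page markup and the `extract_links` helper are hypothetical. It uses only the standard library's `html.parser` and collects nothing but `<a href>` targets, so the form post and the script redirect expose no link to follow.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags only, as a simple crawler might.

    Forms, selects and <script> content are ignored, so a language dropdown
    that relies on a form post or JavaScript exposes nothing to follow.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <A> matches too.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: a language dropdown, a script redirect and one real link.
page = """
<form method="post" action="/news.html">
  <select name="language" onchange="this.form.submit()">
    <option value="en-US">English</option>
    <option value="de">Deutsch</option>
  </select>
</form>
<script>location.href = '/news.html?language=fr';</script>
<a href="/products.html">Products</a>
"""

print(extract_links(page, "http://www.yoursite.com/news.html"))
# ['http://www.yoursite.com/products.html']
```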
Let’s say your website www.yoursite.com shows up in American English by default, but has a dropdown box for switching to French or German. Imagine you have navigated to www.yoursite.com/news.html. After you select German, for example, you are still on the same page, but because your site logic has saved German in a session variable, the page is now shown in German.
Now imagine what happens when you submit www.yoursite.com to Google and ask it to crawl your website. Your website will be indexed only in American English, for the following reasons:
- The crawler cannot get past the form post (or JavaScript) needed to submit the value from your dropdown box.
- Even if it got past the dropdown, it would not keep the session, so it would immediately forget the German setting and see all other pages in American English.
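The session problem can be modelled in a few lines. This is a toy sketch, not Sitecore code: `Site`, `select_language` and `render` are hypothetical names, and a session is reduced to a dictionary keyed by a session id that a browser would normally carry in a cookie. A crawler that neither posts the form nor sends a session id only ever gets the default language.

```python
DEFAULT_LANGUAGE = "en-US"


class Site:
    """Toy site that remembers the chosen language in a server-side session."""

    def __init__(self):
        self.sessions = {}  # session id -> {"language": ...}

    def select_language(self, session_id, language):
        # What the dropdown's form post would do for a human visitor.
        self.sessions.setdefault(session_id, {})["language"] = language

    def render(self, path, session_id=None):
        # No session id (or an unknown one) means the default language.
        session = self.sessions.get(session_id, {})
        language = session.get("language", DEFAULT_LANGUAGE)
        return f"{path} in {language}"


site = Site()

# A browser posts the dropdown once and keeps its session cookie afterwards.
site.select_language("browser-session", "de")
print(site.render("/news.html", "browser-session"))      # /news.html in de
print(site.render("/products.html", "browser-session"))  # /products.html in de

# A crawler never posts the form and sends no session cookie.
print(site.render("/news.html"))      # /news.html in en-US
print(site.render("/products.html"))  # /products.html in en-US
```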
Now assume that instead of the dropdown box, you add images of country flags to the page and have each one link to a URL such as www.yoursite.com/news.html?language=fr*. You would probably still have to save the setting in a session variable to show the rest of the site in French. How would that look to the webcrawler?
- It would now probably see the news.html page in French and therefore index it in French, but when it crawled on to, for example, www.yoursite.com/products.html, it would have lost the session and would see that page in American English.
*) This is just an example; Sitecore's standard syntax for setting the language may be different.
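Switching to flag links helps only partially, which a small simulation shows. The sketch assumes a hypothetical `resolve_language` that honours the example `?language=` parameter first (again, not necessarily Sitecore's syntax) and otherwise falls back to the session; the crawler is modelled as starting every request with an empty session.

```python
from urllib.parse import parse_qs, urlsplit

DEFAULT_LANGUAGE = "en-US"


def resolve_language(url, session):
    """Hypothetical resolver: query string first, then session, then default."""
    query = parse_qs(urlsplit(url).query)
    if "language" in query:
        language = query["language"][0]
        session["language"] = language  # remembered for later requests
        return language
    return session.get("language", DEFAULT_LANGUAGE)


def crawl(urls):
    """Stateless crawler: each request starts with a fresh, empty session."""
    return {url: resolve_language(url, session={}) for url in urls}


urls = [
    "http://www.yoursite.com/news.html?language=fr",
    "http://www.yoursite.com/products.html",
]

# A browser reuses one session, so French sticks after the first click.
browser_session = {}
for url in urls:
    print("browser:", url, "->", resolve_language(url, browser_session))

# The crawler sees news.html in French but products.html in American English.
for url, language in crawl(urls).items():
    print("crawler:", url, "->", language)
```

The browser gets French on both pages; the crawler gets French only on the page whose URL carries the parameter.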
So what do you do? Here is a little trick that I had not thought of before, though it may be worth a try:
- On every page, add invisible links to a language version of that page for each non-default language, for example <A href="/news.html?language=fr">.</A>, and likewise for German in our example. This way, the crawler sees a language version of each page.
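Generating those extra links is easy to do in a page template. A minimal sketch, assuming the example `?language=` syntax and a hypothetical `language_links` helper with a hypothetical list of site languages; it emits one plain `<a href>` per non-default language, which is the only kind of link the crawler follows. How the links are made invisible is left to the page's markup and styling.

```python
from html import escape
from urllib.parse import urlencode

DEFAULT_LANGUAGE = "en-US"
SITE_LANGUAGES = ("en-US", "fr", "de")  # the example languages from the article


def language_links(path, languages=SITE_LANGUAGES, default=DEFAULT_LANGUAGE):
    """Return one plain <a href> per non-default language version of `path`."""
    links = []
    for language in languages:
        if language == default:
            continue  # the default version is already reachable without a parameter
        href = f"{path}?{urlencode({'language': language})}"
        links.append(f'<a href="{escape(href)}">.</a>')
    return "\n".join(links)


print(language_links("/news.html"))
# <a href="/news.html?language=fr">.</a>
# <a href="/news.html?language=de">.</a>
```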
The neat solution is to have a domain name for each language, for example www.yoursite.fr for French. Your site then simply shows up in the language that corresponds to the domain, without needing a session variable. A fine example of this is www.nilfisk-advance.dk, which shows up in Danish, and www.nilfisk-advance.pt, which shows up in Portuguese. If you search Google for "Na Nilfisk-Advance ocorrem coisas" (Portuguese for "At Nilfisk-Advance, things happen"), www.nilfisk-advance.pt shows up as the first (and only) result. Having read this far, that is probably exactly what you want to happen for each of your languages. This works so nicely for the following reasons:
- Google needs only an initial link to the root of the site (www.nilfisk-advance.pt) to crawl the site in that language.
- No session is needed to remember the language.
- Each page has a unique URL for each language, because each language has its own domain name.
Therefore, www.nilfisk-advance.pt/Info/News.html and www.nilfisk-advance.dk/Info/News.html are seen as different pages by Google and indexed separately.
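With one domain per language, the language can be derived from the host name on every request. A sketch under stated assumptions: `DOMAIN_LANGUAGES` and `language_for_url` are hypothetical, the host names follow the article's www.yoursite examples, and www.yoursite.de for German is added for illustration. Because the answer depends only on the URL, with no session and no query string, a stateless crawler gets the same language as a browser.

```python
from urllib.parse import urlsplit

# Hypothetical host-to-language table.
DOMAIN_LANGUAGES = {
    "www.yoursite.com": "en-US",
    "www.yoursite.fr": "fr",
    "www.yoursite.de": "de",
}
DEFAULT_LANGUAGE = "en-US"


def language_for_url(url):
    """Pick the language from the host name alone (hostname is lowercased)."""
    host = urlsplit(url).hostname or ""
    return DOMAIN_LANGUAGES.get(host, DEFAULT_LANGUAGE)


for url in [
    "http://www.yoursite.fr/news.html",
    "http://www.yoursite.fr/products.html",
    "http://www.yoursite.de/news.html",
]:
    print(url, "->", language_for_url(url))
# http://www.yoursite.fr/news.html -> fr
# http://www.yoursite.fr/products.html -> fr
# http://www.yoursite.de/news.html -> de
```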